{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
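    "In the injection-molding and load-forecasting projects I kept running into feature-selection problems: even with a baseline in place, the final solutions usually used only a handful of features, which shows how much feature selection matters.\n",
    "Below is a collection of common feature-selection methods."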
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### sklearn\n",
    "- Three families are provided: filter (variance), iterative / wrapper (forward, backward), and embedded (L1/L2 coefficients)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Filter"
   ]
  },
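  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of filter selection, assuming scikit-learn's `VarianceThreshold` and `SelectKBest` applied to synthetic data from `make_regression`; the data, `k=3` and the zero-variance threshold are illustrative choices, not taken from the original projects. It first drops constant columns, then keeps the k features with the largest univariate F-statistic against the target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Filter methods: score each feature independently of any downstream model\n",
    "import pandas as pd\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_regression\n",
    "\n",
    "# Illustrative data: 10 features, only 3 of them informative\n",
    "X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)\n",
    "X_demo = pd.DataFrame(X_demo, columns=['f_' + str(i) for i in range(10)])\n",
    "X_demo['f_const'] = 1.0  # a constant column carries no information\n",
    "\n",
    "# 1) Variance filter: threshold=0.0 drops only zero-variance (constant) columns\n",
    "vt = VarianceThreshold(threshold=0.0)\n",
    "vt.fit(X_demo)\n",
    "X_var = X_demo.loc[:, vt.get_support()]\n",
    "\n",
    "# 2) Univariate filter: keep the k columns with the largest F-statistic against y\n",
    "skb = SelectKBest(score_func=f_regression, k=3)\n",
    "skb.fit(X_var, y_demo)\n",
    "kept = X_var.columns[skb.get_support()]\n",
    "\n",
    "print('after variance filter:', list(X_var.columns))\n",
    "print('top-3 by F-test:', list(kept))"
   ]
  },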
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
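    "#### Iterative (wrapper)\n",
    "- Recursive feature elimination (RFE): repeatedly drop the n least important features until the number of remaining features reaches the target value"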
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
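    "from sklearn.feature_selection import RFE\n",
    "from lightgbm import LGBMRegressor\n",
    "# X (features) and y (target) are built in the Lasso section below, so run that cell first.\n",
    "# step=0.1: each iteration removes 10% of the original feature count (at least 1 feature)\n",
    "# rfe = RFE(estimator=LGBMRegressor(n_estimators=1000), n_features_to_select=20, step=0.1, verbose=1)\n",
    "# rfe.fit(X, y.values.ravel())\n",
    "# rfe.ranking_  # rank 1 = selected; higher ranks were eliminated earlier\n",
    "# data[X.columns[rfe.support_]]  # keep only the selected columns of any frame that has X's columns"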
   ]
  },
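  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The RFE cell above only runs once `X` and `y` exist. A self-contained sketch of the wrapper methods on synthetic data follows, assuming scikit-learn >= 0.24 (needed for `SequentialFeatureSelector`, which covers the forward and backward search mentioned above). `LinearRegression` stands in for `LGBMRegressor` only to keep it fast; the data and `n_features_to_select=3` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Self-contained wrapper-method sketch on synthetic data\n",
    "import pandas as pd\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.feature_selection import RFE, SequentialFeatureSelector\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "X_demo, y_demo = make_regression(n_samples=200, n_features=8, n_informative=3, noise=1.0, random_state=0)\n",
    "X_demo = pd.DataFrame(X_demo, columns=['f_' + str(i) for i in range(8)])\n",
    "\n",
    "# RFE: fit, drop the `step` weakest features (smallest |coef_|), refit, repeat until 3 remain\n",
    "rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)\n",
    "rfe.fit(X_demo, y_demo)\n",
    "print('RFE keeps:', list(X_demo.columns[rfe.support_]))\n",
    "print('RFE ranking (1 = kept):', dict(zip(X_demo.columns, rfe.ranking_)))\n",
    "\n",
    "# Forward selection: start from no features and greedily add the one that most improves the CV score\n",
    "# (direction='backward' starts from all features and greedily removes them instead)\n",
    "sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3, direction='forward', cv=5)\n",
    "sfs.fit(X_demo, y_demo)\n",
    "print('forward selection keeps:', list(X_demo.columns[sfs.get_support()]))"
   ]
  },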
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Embedded: Lasso"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
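    "# Build the data\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "# note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2;\n",
    "# on newer versions substitute another regression dataset\n",
    "from sklearn.datasets import load_boston\n",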
    "\n",
    "X,y = load_boston(return_X_y = True)\n",
    "X = pd.DataFrame(X,columns=['col_'+ str(i) for i in range(X.shape[1]) ])\n",
    "y = pd.DataFrame(y,columns=['target'])\n",
    "\n",
    "# Build a training set and a test set; the test set has no target\n",
    "train = pd.concat([X,y],axis=1)\n",
    "test = X.iloc[:10,:]\n",
    "\n",
    "# train: features + target\n",
    "# test: features only\n",
    "\n",
    "x_train = X.copy()\n",
    "y_train = y.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected features: Index(['col_3', 'col_4', 'col_5', 'col_7'], dtype='object')\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3oAAAHSCAYAAAC6g7nSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3de7RlZXkv6N/LJRC8oODlEBEKc0iiCFXBbZSArSlO0AjinVQGCYgnEm+BnBHoEPHYaEIG0ZgEhFxIBIsakBjpQOjm2NGUEUNoPdmEaykG7S5zQNpw0YpI5EDx9h+1wLKyq/auvar2Ze7nGYOx15rft775LmbNVftX37fmrO4OAAAAw7HLfBcAAADAjiXoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAzMbvNdwGw94xnP6GXLls13GQAAAPPixhtvvK+7nzlV26INesuWLcvk5OR8lwEAADAvquprW2uzdBMAAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgVm0t1e47e4NWXbWtfNdBgAAMFDrzzt2vkuYNTN6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMzJ0Gvqs6pqjNm0O+MquqqesZc1AUAADBEC2ZGr6qem+Snk/zzfNcCAACwmI0V9KrqpKq6tapuqao1VXVgVa0dbVtbVQdsx3C/l+R/TdLj1AQAALDUzTroVdUhSc5OsrK7lyc5PcmFSS7r7sOSXJ7kghmOdXySu7v7lmn6nVpVk1U1ufGhDbMtHQAAYNDGmdFbmeTK7r4vSbr7gSRHJLli1L4myVHTDVJVe2VTYHzfdH27++LunujuiV332nvWhQMAAAzZOEGvMv0yy5ksw/zhJAcluaWq1ifZP8k/VtV/GKM2AACAJWucoLc2yQlVtW+SVNU+SW5IsmrUfmKS66cbpLtv6+5ndfey7l6W5K4kh3f3/zdGbQAAAEvWbrN9YXevq6pzk1xXVRuT3JTktCSXVNWZSe5NcsqOKRMAAICZmnXQS5LuXp1k9RabV07R75ztGHPZODUBAAAsdQvmPnoAAADsGGPN6G2vqrooyZFbbD6/uy+dyzoAAACGrLoX5/3JJyYmenJycr7LAAAAmBdVdWN3T0zVZukmAADAwAh6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwgh4AAMDACHoAAAADI+gBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADIygBwAAMDC7zXcBs3Xb3Ruy7Kxr57sM2G7rzzt2vksAAGDgzOgBAAAMjKAHAAAwMIIeAADAwAh6AAAAAzMnQa+qzqmqM6Zpv7uqbh799+q5qAsAAGCIFtJVN3+vu39nvosAAABY7Maa0auqk6rq1qq6parWVNWBVbV2tG1tVR2wowoFAABgZmYd9KrqkCRnJ1nZ3cuTnJ7kwiSXdfdhSS5PcsF2DPnuUUC8pKqevpV9nlpVk1U1ufGhDbMtHQAAYNDGmdFbmeTK7r4vSbr7gSRHJLli1L4myVEzHOsPk/xwkhVJ7kny4ak6dffF3T3R3RO77rX3GKUDAAAM1zhBr5L0NH2ma9/Uqfsb3b2xux9L8idJfmKMugAAAJa0cYLe2iQnVNW+SVJV+yS5IcmqUfuJSa6fyUBVtd9mT1+f5PYx6gIAAFjSZn3Vze5eV1XnJrmuqjYmuSnJaUkuqaozk9yb5JQZDvfBqlqRTTOA65P80mzrAgAAWOrGur1Cd69OsnqLzSun6HfONOP8wjh1AAAA8D1zcsN0AAAA5s6c3jC9qi5KcuQWm8/v7kvnsg4AAIAhq+4ZXRhzwZmYmOjJycn5LgMAAGBeVNWN3T0xVZulmwAAAAMj6AEAAAyMoAcAADAwgh4AAMDACHoAAAADI+gBAAAMjKAHAAAwMIIeAADAwAh6
AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwgh4AAMDA7DbfBczWbXdvyLKzrp3vMtjJ1p937HyXAAAAi44ZPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBmZOgV1XnVNUZ22j/jaq6tapurqpPVdUPzUVdAAAAQ7RQZvQ+1N2HdfeKJP9nkvfNd0EAAACL1VhBr6pOGs3E3VJVa6rqwKpaO9q2tqoOmMk43f2vmz19UpIepy4AAIClbNY3TK+qQ5KcneTI7r6vqvZJsjrJZd29uqremuSCJK+b4XjnJjkpyYYkP7WVPqcmOTVJdn3qM2dbOgAAwKCNM6O3MsmV3X1fknT3A0mOSHLFqH1NkqNmOlh3n93dz01yeZJ3b6XPxd090d0Tu+619xilAwAADNc4Qa8y/RLL2SzBvCLJG2fxOgAAADJe0Fub5ISq2jdJRks3b0iyatR+YpLrZzJQVR282dPjk9wxRl0AAABL2qy/o9fd60bfq7uuqjYmuSnJaUkuqaozk9yb5JQZDndeVf1okseSfC3J22dbFwAAwFI366CXJN29OpsuwLK5lVP0O2eacSzVBAAA2EEWyn30AAAA2EHGmtHbXlV1UZIjt9h8fndfOpd1AAAADFl1L857k09MTPTk5OR8lwEAADAvqurG7p6Yqs3STQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgdpvvAmbrtrs3ZNlZ1853GQvG+vOOne8SAACABcKMHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwc3Ixlqo6J8mD3f07W2n/eJIfHT19WpJvdfeKuagNAABgaBbEVTe7+2cff1xVH06yYR7LAQAAWNTGWrpZVSdV1a1VdUtVramqA6tq7Wjb2qo6YDvHqyQnJPmzceoCAABYymYd9KrqkCRnJ1nZ3cuTnJ7kwiSXdfdhSS5PcsF2DvuyJN/o7ju3ss9Tq2qyqiY3PmTSDwAAYCrjzOitTHJld9+XJN39QJIjklwxal+T5KjtHPPnso3ZvO6+uLsnunti1732nkXJAAAAwzfOd/QqSU/TZ7r27w1WtVuSNyR50Rg1AQAALHnjzOitTXJCVe2bJFW1T5IbkqwatZ+Y5PrtGO8/Jbmju+8aoyYAAIAlb9Yzet29rqrOTXJdVW1MclOS05JcUlVnJrk3ySnbMeSquAgLAADA2Ma6vUJ3r06yeovNK6fod84MxnrLOLUAAACwyVi3VwAAAGDhmdMbplfVRUmO3GLz+d196VzWAQAAMGTVPeMLYy4oExMTPTk5Od9lAAAAzIuqurG7J6Zqs3QTAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGJjd5ruA2brt7g1Zdta1813GjK0/79j5LgEAAFgizOgBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADMwODXpVdU5VnbGN9jdX1bqqeqyqJrZo+/Wq+kpVfbmqXrkj6wIAAFhK5vr2CrcneUOSP958Y1W9IMmqJIck+aEkf1NVP9LdG+e4PgAAgEVvRjN6VXVSVd1aVbdU1ZqqOrCq1o62ra2qA2YyTnd/qbu/PEXTa5P8eXc/3N3/b5KvJPmJKeo4taomq2py40MbZrJLAACAJWfaoFdVhyQ5O8nK7l6e5PQkFya5rLsPS3J5kgvGrOM5Sf7HZs/vGm37Pt19cXdPdPfErnvtPeYuAQAAhmkmM3ork1zZ3fclSXc/kOSIJFeM2tckOWrMOmqKbT3mmAAAAEvSTIJeZfrQNW4ouyvJczd7vn+Sr485
JgAAwJI0k6C3NskJVbVvklTVPkluyKaLpyTJiUmuH7OOa5Ksqqo9quqgJAcn+e9jjgkAALAkTXvVze5eV1XnJrmuqjYmuSnJaUkuqaozk9yb5JSZ7KyqXp/kI0memeTaqrq5u1852sdfJPlikkeTvMsVNwEAAGanuhfnV+H22O/g3u/k35/vMmZs/XnHzncJAADAgFTVjd09MVXbDr1hOgAAAPNvp9wwvaouSnLkFpvP7+5Ld8b+AAAA+J5Fu3RzYmKiJycn57sMAACAeWHpJgAAwBIi6AEAAAyMoAcAADAwgh4AAMDACHoAAAADI+gBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwgh4AAMDA7DbfBczWbXdvyLKzrt1p468/79idNjYAAMDOZEYPAABgYAQ9AACAgRH0AAAABkbQAwAAGJgdGvSq6pyqOmMb7W+uqnVV9VhVTWy2/aer6saqum30c+WOrAsAAGApmeurbt6e5A1J/niL7fcleU13f72qXpjkr5M8Z45rAwAAGIQZzehV1UlVdWtV3VJVa6rqwKpaO9q2tqoOmMk43f2l7v7yFNtv6u6vj56uS7JnVe0x87cBAADA46YNelV1SJKzk6zs7uVJTk9yYZLLuvuwJJcnuWAH1vTGJDd198NT1HJqVU1W1eTGhzbswF0CAAAMx0xm9FYmubK770uS7n4gyRFJrhi1r0ly1I4oZhQqfzvJL03V3t0Xd/dEd0/sutfeO2KXAAAAgzOToFdJepo+07VPv5Oq/ZNcleSk7v7quOMBAAAsVTMJemuTnFBV+yZJVe2T5IYkq0btJya5fpwiquppSa5N8uvd/ffjjAUAALDUTRv0untdknOTXFdVtyT53SSnJTmlqm5N8gvZ9L29aVXV66vqrmxa+nltVf31qOndSf5jkv9aVTeP/nvW9r8dAAAAqnvsVZfzYo/9Du79Tv79nTb++vOO3WljAwAAjKuqbuzuianadugN0wEAAJh/O+WG6VV1UZIjt9h8fndfujP2BwAAwPcs2qWbExMTPTk5Od9lAAAAzAtLNwEAAJYQQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGZrf5LmC2brt7Q5adde12v279ecfuhGoAAAAWDjN6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMzJ0Gvqs6pqjO20b6iqj5fVTdX1WRV/cRc1AUAADBEC2VG74NJ3t/dK5K8b/QcAACAWRgr6FXVSVV1a1XdUlVrqurAqlo72ra2qg6Y4VCd5Kmjx3sn+fpW9nfqaMZvcuNDG8YpHQAAYLBmfcP0qjokydlJjuzu+6pqnySrk1zW3aur6q1JLkjyuhkM9ytJ/rqqfiebwudPTtWpuy9OcnGS7LHfwT3b2gEAAIZsnBm9lUmu7O77kqS7H0hyRJIrRu1rkhw1w7HekeS/dPdzk/yXJB8doy4AAIAlbZygV9m05HJbZjrrdnKSvxw9/kQSF2MBAACYpXGC3tokJ1TVvkkyWrp5Q5JVo/YTk1w/w7G+nuTlo8crk9w5Rl0AAABL2qy/o9fd66rq3CTXVdXGJDclOS3JJVV1ZpJ7k5wyw+HeluT8qtotyXeTnDrbugAAAJa6WQe9JOnu1dl0AZbNrZyi3znTjHN9kheNUwsAAACbLJT76AEAALCDjDWjt72q6qIkR26x+fzuvnQu6wAAABiy6l6ct6ObmJjoycnJ+S4DAABgXlTVjd09MVWbpZsAAAADI+gBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwgh4AAMDACHoAAAADI+gBAAAM
jKAHAAAwMLvNdwGzddvdG7LsrGu/b9v6846dp2oAAAAWDjN6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMzJ0Gvqs6pqjO20b5PVX26qu4c/Xz6XNQFAAAwRAtlRu+sJGu7++Aka0fPAQAAmIWxgl5VnVRVt1bVLVW1pqoOrKq1o21rq+qAGQ712iSrR49XJ3ndOHUBAAAsZbMOelV1SJKzk6zs7uVJTk9yYZLLuvuwJJcnuWCGwz27u+9JktHPZ21ln6dW1WRVTW58aMNsSwcAABi0cWb0Via5srvvS5LufiDJEUmuGLWvSXLUeOV9v+6+uLsnunti17323pFDAwAADMY4Qa+S9DR9pmt/3Deqar8kGf38lzHqAgAAWNLGCXprk5xQVfsmm66cmeSGJKtG7ScmuX6GY12T5OTR45OT/NUYdQEAACxpu832hd29rqrOTXJdVW1MclOS05JcUlVnJrk3ySkzHO68JH9RVf85yT8nefNs6wIAAFjqZh30kqS7V+d7V8t83Mop+p0zzTj3Jzl6nFoAAADYZKHcRw8AAIAdZKwZve1VVRclOXKLzed396VzWQcAAMCQVfdML4y5sExMTPTk5OR8lwEAADAvqurG7p6Yqs3STQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgFm3Qu+3uDVl21rXzXQYAAMCCs2iDHgAAAFMT9AAAAAZG0AMAABgYQQ8AAGBg5iToVdU5VXXGNtrfXFXrquqxqpqYi5oAAACGaqHM6N2e5A1JPjffhQAAACx2YwW9qjqpqm6tqluqak1VHVhVa0fb1lbVATMZp7u/1N1fHqcWAAAANpl10KuqQ5KcnWRldy9PcnqSC5Nc1t2HJbk8yQU7pMrv7fPUqpqsqsmND23YkUMDAAAMxjgzeiuTXNnd9yVJdz+Q5IgkV4za1yQ5arzyvl93X9zdE909setee+/IoQEAAAZjnKBXSXqaPtO1AwAAsIONE/TWJjmhqvZNkqraJ8kNSVaN2k9Mcv145QEAALC9dpvtC7t7XVWdm+S6qtqY5KYkpyW5pKrOTHJvklNmMlZVvT7JR5I8M8m1VXVzd79ytrUBAAAsZbMOeknS3auTrN5i88op+p0zzThXJblqnFoAAADYZKHcRw8AAIAdZKwZve1VVRclOXKLzed396VzWQcAAMCQzWnQ6+53zeX+AAAAlqJFu3Tz0OfsnfXnHTvfZQAAACw4izboAQAAMDVBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIFZtEHvtrs3zHcJAAAAC9KiDXoAAABMTdADAAAYGEEPAABgYAQ9AACAgRH0AAAABmZOgl5VnVNVZ2yj/UNVdUdV3VpVV1XV0+aiLgAAgCFaKDN6n07ywu4+LMk/Jfn1ea4HAABg0Ror6FXVSaNZuFuqak1VHVhVa0fb1lbVATMZp7s/1d2Pjp5+Psn+49QFAACwlM066FXVIUnOTrKyu5cnOT3JhUkuG83MXZ7kglkM/dYkn9zKPk+tqsmqmtz4kBumAwAATGWcGb2VSa7s7vuSpLsfSHJEkitG7WuSHLU9A1bV2UkezaaQ+O9098XdPdHdE7vutfesCwcAABiy3cZ4bSXpafpM1/69wapOTnJckqO7e8avAwAA4PuNM6O3NskJVbVvklTVPkluSLJq1H5ikutnMlBVvSrJryU5vrsfGqMmAACAJW/WM3rdva6qzk1yXVVtTHJTktOSXFJVZya5N8kpMxzuwiR7JPl0VSXJ57v77bOtDQAAYCmrxbpKco/9Du6H77lzvssAAACYF1V1Y3dPTNW2
UO6jBwAAwA4yzsVYtltVXZTkyC02n9/dl85lHQAAAEM2p0Gvu9+1o8Y69DlurwAAADAVSzcBAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgVm0Qe+2uzfMdwkAAAAL0qINegAAAExN0AMAABgYQQ8AAGBgBD0AAICBmZOgV1XnVNUZ22hfXlX/d1XdVlX/R1U9dS7qAgAAGKKFMqP3p0nO6u5Dk1yV5Mx5rgcAAGDRGivoVdVJVXVrVd1SVWuq6sCqWjvatraqDpjhUD+a5HOjx59O8sZx6gIAAFjKZh30quqQJGcnWdndy5OcnuTCJJd192FJLk9ywQyHuz3J8aPHb07y3K3s89SqmqyqyY0PuY8eAADAVMaZ0VuZ5Mruvi9JuvuBJEckuWLUvibJUTMc661J3lVVNyZ5SpL/OVWn7r64uye6e2LXvfYeo3QAAIDh2m2M11aSnqbPdO2bOnXfkeSYJKmqH0ly7Bh1AQAALGnjzOitTXJCVe2bJFW1T5IbkqwatZ+Y5PqZDFRVzxr93CXJe5P80Rh1AQAALGmzntHr7nVVdW6S66pqY5KbkpyW5JKqOjPJvUlOmeFwP1dV7xo9/sskl862LgAAgKWuume0unLB2WO/g/vhe+6c7zIAAADmRVXd2N0TU7UtlPvoAQAAsIOMczGW7VZVFyU5covN53e3pZoAAAA7yJwGve5+1/S9AAAAGMeiXbp56HPcRw8AAGAqizboAQAAMDVBDwAAYGAEPQAAgIER9AAAAAZG0AMAABgYQQ8AAGBgBD0AAICBEfQAAAAGRtADAAAYGEEPAABgYAQ9AACAgRH0AAAABkbQAwAAGBhBDwAAYGAEPQAAgIFZtEHvtrs3zHcJAAAAC9KiDXoAAABMTdADAAAYGEEPAABgYAQ9AACAgRH0AAAABmaHBr2qOqeqzthG+5ural1VPVZVE5tt37eq/raqHqyqC3dkTQAAAEvNXM/o3Z7kDUk+t8X27yb5r0m2GhIBAACYmRkFvao6qapurapbqmpNVR1YVWtH29ZW1QEzGae7v9TdX55i+3e6+/psCnwAAACMYdqgV1WHJDk7ycruXp7k9CQXJrmsuw9LcnmSC3Zqld+r5dSqmqyqyY0PuWE6AADAVGYyo7cyyZXdfV+SdPcDSY5IcsWofU2So3ZOed+vuy/u7onunth1r73nYpcAAACLzkyCXiXpafpM1w4AAMAcmUnQW5vkhKraN0mqap8kNyRZNWo/Mcn1O6c8AAAAttdu03Xo7nVVdW6S66pqY5KbkpyW5JKqOjPJvUlOmcnOqur1ST6S5JlJrq2qm7v7laO29UmemuQHqup1SY7p7i/O4j0BAAAsadW9OFdd7rHfwf3wPXfOdxkAAADzoqpu7O6Jqdrm+j56AAAA7GTTLt2cjaq6KMmRW2w+v7sv3Rn7AwAA4Ht2StDr7nftjHE3d+hz3F4BAABgKpZuAgAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwgh4AAMDACHoAAAADI+gBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADIygBwAAMDCCHgAAwMAIegAAAAMj6AEAAAyMoAcAADAwu813ATvSI488krvuuivf/e5357sUFpA999wz+++/f3bffff5LgUAAObEoILeXXfdlac85SlZtmxZqmq+y2EB6O7cf//9ueuuu3LQQQfNdzkAADAnBrV087vf/W723XdfIY8nVFX23Xdfs7wAACwpcxL0quqcqjpjmj6/XFVfrqp1VfXBMfY125cyUP5MAACw1CyIGb2q+qkkr01yWHcfkuR35rmkWdt1112zYsWKJ/5bv379do/xrW99K3/wB3+w44sb+djHPpZ3v/vdO238qVx99dX54he/
OKf7BACApWqs7+hV1UlJzkjSSW5N8t4klyR5ZpJ7k5zS3f88g6HekeS87n44Sbr7X8ap63HLzrp2RwzzhPXnHTttnx/8wR/MzTffPNZ+Hg9673znO7frdRs3bsyuu+461r53hkcffTRXX311jjvuuLzgBS+Y73IAAGDwZj2jV1WHJDk7ycruXp7k9CQXJrmsuw9LcnmSC2Y43I8keVlVfaGqrquqF29ln6dW1WRVTd57772zLX3Obdy4MWeeeWZe/OIX57DDDssf//EfJ0kefPDBHH300Tn88MNz6KGH5q/+6q+SJGeddVa++tWvZsWKFTnzzDPz2c9+Nscdd9wT47373e/Oxz72sSTJsmXL8oEPfCBHHXVUPvGJT+SrX/1qXvWqV+VFL3pRXvayl+WOO+7YZm1vectb8o53vCM/9VM/lec973m57rrr8ta3vjXPf/7z85a3vOWJfk9+8pPzq7/6qzn88MNz9NFH5/H//zfffHNe+tKX5rDDDsvrX//6fPOb30ySvOIVr8h73vOevPzlL89v//Zv55prrsmZZ56ZFStW5Ktf/Wr+5E/+JC9+8YuzfPnyvPGNb8xDDz30RD2nnXZafvInfzLPe97zcuWVVz5Rwwc/+MEceuihWb58ec4666wk2e73CwAAS8E4M3ork1zZ3fclSXc/UFVHJHnDqH1Nkpl+1263JE9P8tIkL07yF1X1vO7uzTt198VJLk6SiYmJ/nejLAD/9m//lhUrViRJDjrooFx11VX56Ec/mr333jv/8A//kIcffjhHHnlkjjnmmDz3uc/NVVddlac+9am577778tKXvjTHH398zjvvvNx+++1PzAx+9rOf3eY+99xzz1x//fVJkqOPPjp/9Ed/lIMPPjhf+MIX8s53vjOf+cxntvn6b37zm/nMZz6Ta665Jq95zWvy93//9/nTP/3TvPjFL87NN9+cFStW5Dvf+U4OP/zwfPjDH84HPvCBvP/978+FF16Yk046KR/5yEfy8pe/PO973/vy/ve/P7//+7+fZNPM5HXXXZckufPOO3PcccflTW96U5LkaU97Wt72trclSd773vfmox/9aH75l385SXLPPffk+uuvzx133JHjjz8+b3rTm/LJT34yV199dXL49MwAAArwSURBVL7whS9kr732ygMPPJAkOfXUU7f7/QIAwNCNE/Qqm5ZsbstMw9hdSf5yFOz+e1U9luQZ2bT8c1GZaunmpz71qdx6661PzE5t2LAhd955Z/bff/+85z3vyec+97nssssuufvuu/ONb3xju/f5sz/7s0k2zRDecMMNefOb3/xE28MPPzzt61/zmtekqnLooYfm2c9+dg499NAkySGHHJL169dnxYoV2WWXXZ7Yz8///M/nDW94QzZs2JBvfetbefnLX54kOfnkk79v34/3n8rtt9+e9773vfnWt76VBx98MK985SufaHvd616XXXbZJS94wQue+P/xN3/zNznllFOy1157JUn22WefWb9fAAAYunGC3tokV1XV73X3/VW1T5IbkqzKptm8E5NcP8Oxrs6mGcLPVtWPJPmBJPeNUduC0t35yEc+8n1hJtl0UZR77703N954Y3bfffcsW7ZsytsA7LbbbnnssceeeL5lnyc96UlJksceeyxPe9rTtvs7gnvssUeSZJdddnni8ePPH3300SlfM5MrWT5e11Te8pa35Oqrr87y5cvzsY997PtmLTev4fFJ3e7+d/uc7fsFAIChm/V39Lp7XZJzk1xXVbck+d0kpyU5papuTfIL2fS9vZm4JMnzqur2JH+e5OQtl20uZq985Svzh3/4h3nkkUeSJP/0T/+U73znO9mwYUOe9axnZffdd8/f/u3f5mtf+1qS5ClPeUq+/e1vP/H6Aw88MF/84hfz8MMPZ8OGDVm7du2U+3nqU5+agw46KJ/4xCeSbApHt9xyyw55D4899tgTM5JXXHFFjjrqqOy99955+tOfnr/7u79LkqxZs+aJ
2b0tbfmevv3tb2e//fbLI488kssvv3za/R9zzDG55JJLnvgu3wMPPLBT3y8AACxmY111s7tXJ1m9xeaVU/Q7Z5px/meSnx+nloXsF3/xF7N+/focfvjh6e4885nPzNVXX50TTzwxr3nNazIxMZEVK1bkx37sx5Ik++67b4488si88IUvzM/8zM/kQx/6UE444YQcdthhOfjgg/PjP/7jW93X5Zdfnne84x35zd/8zTzyyCNZtWpVli9fPvZ7eNKTnpR169blRS96Ufbee+98/OMfT5KsXr06b3/72/PQQw/lec97Xi699NIpX79q1aq87W1vywUXXJArr7wyv/Ebv5GXvOQlOfDAA3PooYd+Xwicyqte9arcfPPNmZiYyA/8wA/k1a9+dX7rt35rp71fAABYzGqxTpxNTEz05OTk92370pe+lOc///nzVNGwPfnJT86DDz4432XMmj8bAAAMTVXd2N0TU7WNNaM3i0IuSnLkFpvP7+6pp4EAAADYbnMa9Lr7XXO5P3acxTybBwAAS82sL8YCAADAwjS4oLdYv3PIzuPPBAAAS82ggt6ee+6Z+++/3y/2PKG7c//992fPPfec71IAAGDOzOl39Ha2/fffP3fddVfuvffe+S6FBWTPPffM/vvvP99lAADAnBlU0Nt9991z0EEHzXcZAAAA82pQSzcBAAAQ9AAAAAZH0AMAABiYWqxXqKyqbyf58nzXwQ7xjCT3zXcRjM1xHAbHcTgcy2FwHIfBcRyOhXYsD+zuZ07VsJgvxvLl7p6Y7yIYX1VNOpaLn+M4DI7jcDiWw+A4DoPjOByL6VhaugkAADAwgh4AAMDALOagd/F8F8AO41gOg+M4DI7jcDiWw+A4DoPjOByL5lgu2ouxAAAAMLXFPKMHAADAFBZ80KuqV1XVl6vqK1V11hTte1TVx0ftX6iqZXNfJdtSVc+tqr+tqi9V1bqqOn2KPq+oqg1VdfPov/fNR61Mr6rWV9Vto+M0OUV7VdUFo3Py1qo6fD7qZOuq6kc3O9durqp/rapf2aKPc3KBqqpLqupfqur2zbbtU1Wfrqo7Rz+fvpXXnjzqc2dVnTx3VbOlrRzHD1XVHaPPzquq6mlbee02P4eZO1s5judU1d2bfX6+eiuv3ebvuMytrRzLj292HNdX1c1bee2CPCcX9NLNqto1yT8l+ekkdyX5hyQ/191f3KzPO5Mc1t1vr6pVSV7f3T87LwUzparaL8l+3f2PVfWUJDcmed0Wx/EVSc7o7uPmqUxmqKrWJ5no7invITP6C+2Xk7w6yUuSnN/dL5m7Ctkeo8/Zu5O8pLu/ttn2V8Q5uSBV1f+S5MEkl3X3C0fbPpjkge4+b/QL49O7+9e2eN0+SSaTTCTpbPosflF3f3NO3wBJtnocj0nyme5+tKp+O0m2PI6jfuuzjc9h5s5WjuM5SR7s7t/Zxuum/R2XuTXVsdyi/cNJNnT3B6ZoW58FeE4u9Bm9n0jyle7+f7r7fyb58ySv3aLPa5OsHj2+MsnRVVVzWCPT6O57uvsfR4+/neRLSZ4zv1WxE702mz4ku7s/n+Rpo7DPwnR0kq9uHvJY2Lr7c0ke2GLz5n8Xrk7yuile+sokn+7uB0bh7tNJXrXTCmWbpjqO3f2p7n509PTzSfaf88LYLls5H2diJr/jMoe2dSxH2eKEJH82p0WNaaEHveck+R+bPb8r/z4gPNFn9OG4Icm+c1Id2220tPbHk3xhiuYjquqWqvpkVR0yp4WxPTrJp6rqxqo6dYr2mZy3LByrsvW/uJyTi8ezu/ueZNM/riV51hR9nJuLy1uTfHIrbdN9DjP/3j1agnvJVpZSOx8Xl5cl+UZ337mV9gV5Ti70oDfVzNyWa01n0ocFoKqenOR/T/Ir3f2vWzT/Y5IDu3t5ko8kuXqu62PGjuzuw5P8TJJ3jZY6bM45uUhU1Q8kOT7JJ6Zodk4Oj3Nzkaiqs5M8muTyrXSZ7nOY+fWHSX44yYok
9yT58BR9nI+Ly89l27N5C/KcXOhB764kz93s+f5Jvr61PlW1W5K9M7spdHaiqto9m0Le5d39l1u2d/e/dveDo8f/LcnuVfWMOS6TGejur49+/kuSq7Jp+cnmZnLesjD8TJJ/7O5vbNngnFx0vvH4EunRz3+Zoo9zcxEYXSTnuCQn9lYupDCDz2HmUXd/o7s3dvdjSf4kUx8f5+MiMcoXb0jy8a31Wajn5EIPev+Q5OCqOmj0L8+rklyzRZ9rkjx+5bA3ZdOXmP2LyAIyWtf80SRf6u7f3Uqf//D4dyur6iey6c/m/XNXJTNRVU8aXVAnVfWkJMckuX2LbtckOak2eWk2fXH5njkulZnZ6r9QOicXnc3/Ljw5yV9N0eevkxxTVU8fLSU7ZrSNBaKqXpXk15Ic390PbaXPTD6HmUdbfC/99Zn6+Mzkd1wWhv+U5I7uvmuqxoV8Tu423wVsy+iqU+/Opr+Idk1ySXevq6oPJJns7muyKUCsqaqvZNNM3qr5q5itODLJLyS5bbPL0r4nyQFJ0t1/lE0h/R1V9WiSf0uySmBfkJ6d5KrR7/+7Jbmiu/+vqnp78sSx/G/ZdMXNryR5KMkp81Qr21BVe2XT1d5+abNtmx9H5+QCVVV/luQVSZ5RVXcl+d+SnJfkL6rqPyf55yRvHvWdSPL27v7F7n6gqn4jm37BTJIPdLcVMPNkK8fx15PskeTTo8/Zz4+uKv5DSf60u1+drXwOz8NbIFs9jq+oqhXZtBRzfUafs5sfx639jjsPb4GRqY5ld380U3yXfbGckwv69goAAABsv4W+dBMAAIDtJOgBAAAMjKAHAAAwMIIeAADAwAh6AAAAAyPoAQAADIygBwAAMDCCHgAAwMD8/2m6GjQnUUZtAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1080x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
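    "from sklearn.linear_model import Lasso\n",
    "import matplotlib.pyplot as plt\n",
    "# alpha sets the L1 penalty strength: larger alpha drives more coefficients to exactly 0\n",
    "lasso=Lasso(alpha=0.001)\n",
    "lasso.fit(x_train,y_train.values.ravel())\n",
    "\n",
    "FI_lasso = pd.DataFrame({\"Feature Importance\":lasso.coef_}, index=x_train.columns)\n",
    "\n",
    "FI_lasso.abs().sort_values('Feature Importance').plot(kind = 'barh',figsize = (15,8))\n",
    "\n",
    "# keep features with |coefficient| > 1; the features are not standardized,\n",
    "# so this threshold depends on each feature's scale\n",
    "FI_10 = FI_lasso[FI_lasso.abs()['Feature Importance']>1].index\n",
    "print('Selected features:',FI_10)\n",
    "#x_train[FI_10]"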
   ]
  },
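  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lasso coefficients depend on feature scale, so a fixed cutoff such as `> 1` is only meaningful on standardized features. scikit-learn also wraps this embedded pattern in `SelectFromModel`. A sketch on synthetic data, assuming standardized inputs and an untuned, illustrative `alpha=1.0`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Embedded selection with SelectFromModel on standardized features\n",
    "import pandas as pd\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.feature_selection import SelectFromModel\n",
    "from sklearn.linear_model import Lasso\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0)\n",
    "X_demo = pd.DataFrame(X_demo, columns=['f_' + str(i) for i in range(10)])\n",
    "\n",
    "# Standardize so that coefficient magnitudes are comparable across features\n",
    "X_std = StandardScaler().fit_transform(X_demo)\n",
    "\n",
    "# SelectFromModel fits the Lasso and keeps features whose |coef_| is at least the threshold;\n",
    "# with threshold=None and a Lasso estimator the threshold is 1e-5, i.e. 'non-zero coefficient'\n",
    "sfm = SelectFromModel(Lasso(alpha=1.0))\n",
    "sfm.fit(X_std, y_demo)\n",
    "print('threshold:', sfm.threshold_)\n",
    "print('kept:', list(X_demo.columns[sfm.get_support()]))\n",
    "print('coefficients:', sfm.estimator_.coef_.round(2))"
   ]
  },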
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
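    "### Null importance\n",
    "- Kaggle tip: Null Importances  https://mp.weixin.qq.com/s/B0-VSkPhkDJkwpllHahJiQ\n",
    "- https://github.com/DLLXW/data-science-competition/tree/main/dc竞赛/公积金贷款逾期预测\n",
    "- https://github.com/xy0210/DCIC-2019-China-Mobile/blob/master/src/null_importance_select.py\n",
    "\n",
    "Idea: train once on the real labels to get each feature's actual importance, then retrain several times on randomly shuffled labels to get its null importance, i.e. the importance it earns by chance. Features whose actual importance is not clearly above their null importance are candidates for removal.\n",
    "\n",
    "Besides binary classification, the method should also apply to multi-class classification and regression. Shuffling the label column works the same way for any target type, so only the model objective and metric need to change; I tested it on regression and it worked."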
   ]
  },
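  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because a permutation is defined for any target array, the label-shuffling step needs no change for multi-class or regression targets. A compact regression sketch of the same idea follows; it swaps LightGBM for scikit-learn's `RandomForestRegressor` (impurity-based importances) to stay lightweight, and the synthetic data, 10 null runs and model settings are illustrative assumptions. Like the LightGBM function below, it scores each feature as actual importance minus mean null importance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Task-agnostic null-importance sketch (regression)\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "X_demo, y_demo = make_regression(n_samples=300, n_features=8, n_informative=3, noise=5.0, random_state=0)\n",
    "X_demo = pd.DataFrame(X_demo, columns=['f_' + str(i) for i in range(8)])\n",
    "\n",
    "def importances(X, y, seed):\n",
    "    # impurity-based importances of a small random forest (stand-in for LightGBM 'gain')\n",
    "    model = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=seed)\n",
    "    model.fit(X, y)\n",
    "    return pd.Series(model.feature_importances_, index=X.columns)\n",
    "\n",
    "# Actual importance: model trained on the real target\n",
    "actual = importances(X_demo, y_demo, seed=0)\n",
    "\n",
    "# Null importance: retrain on permuted targets; rng.permutation returns a shuffled copy\n",
    "# and works for any 1-D label array (binary, multi-class or continuous)\n",
    "n_runs = 10\n",
    "null = pd.concat([importances(X_demo, rng.permutation(y_demo), seed=i) for i in range(n_runs)], axis=1)\n",
    "\n",
    "# Score each feature by actual importance minus its mean null importance\n",
    "score = (actual - null.mean(axis=1)).sort_values(ascending=False)\n",
    "print(score.round(4))"
   ]
  },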
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Implementation 1\n",
    "- Classification: breast-cancer prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Null-importance feature selection\n",
    "from sklearn.model_selection import train_test_split\n",
    "import random\n",
    "import lightgbm\n",
    "def nullImportant(train_df,select_frts):\n",
    "    train_data=train_df[select_frts]\n",
    "    kind=train_df['label']\n",
    "    x_train,x_val,y_train,y_val=train_test_split(train_data,kind,test_size=0.1,random_state=8)\n",
    "    train_matrix = lightgbm.Dataset(x_train, label=y_train)\n",
    "    test_matrix = lightgbm.Dataset(x_val, label=y_val)\n",
    "\n",
    "    params = {\n",
    "        'boosting_type': 'gbdt',\n",
    "        'objective': 'binary',\n",
    "        #'scale_pos_weight':20,\n",
    "        'metrics':'auc',#'binary_logloss',\n",
    "        'num_leaves': 2 ** 6-1,\n",
    "        #'lambda_l2': 10,\n",
    "        'feature_fraction': 0.8,\n",
    "        'bagging_fraction': 0.8,  # only takes effect when bagging_freq > 0\n",
    "        'learning_rate': 0.05,\n",
    "        'seed': 2020,\n",
    "        'nthread': 8,\n",
    "        'num_class': 1,\n",
    "        'verbose': -1,\n",
    "    }\n",
    "    num_round = 4000\n",
    "    early_stopping_rounds = 500\n",
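    "    # LightGBM 4.0 removed verbose_eval/early_stopping_rounds from train(); pass callbacks instead\n",
    "    model = lightgbm.train(params, train_matrix, num_round, valid_sets=[test_matrix],\n",
    "                    #feval=tpr_eval_score,\n",
    "                    callbacks=[lightgbm.early_stopping(early_stopping_rounds), lightgbm.log_evaluation(200)]\n",
    "                    )\n",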
    "    important_normal=sorted(zip(list(x_train.columns), model.feature_importance(\"gain\")),key=lambda x: x[0],reverse=True)\n",
    "    print('finished .......')\n",
    "    #\n",
    "    list_neg=[]\n",
    "    len_round=10\n",
    "    for round_i in range(len_round):\n",
    "        train_data=train_df[select_frts]\n",
    "        # shuffle a copy of the labels; writing them back into train_df['label']\n",
    "        # would silently overwrite the caller's real labels\n",
    "        kind=train_df['label'].values.copy()\n",
    "        random.shuffle(kind)\n",
    "        x_train,x_val,y_train,y_val=train_test_split(train_data,kind,test_size=0.1,random_state=8)\n",
    "        train_matrix = lightgbm.Dataset(x_train, label=y_train)\n",
    "        test_matrix = lightgbm.Dataset(x_val, label=y_val)\n",
    "\n",
    "        params = {\n",
    "            'boosting_type': 'gbdt',\n",
    "            'objective': 'binary',\n",
    "            #'scale_pos_weight':20,\n",
    "            'metrics':'auc',#'binary_logloss',\n",
    "            'num_leaves': 2 ** 6-1,\n",
    "            #'lambda_l2': 10,\n",
    "            'feature_fraction': 0.8,\n",
    "            'bagging_fraction': 0.8,  # only takes effect when bagging_freq > 0\n",
    "            'learning_rate': 0.05,\n",
    "            'seed': 2020,\n",
    "            'nthread': 8,\n",
    "            'num_class': 1,\n",
    "            'verbose': -1,\n",
    "        }\n",
    "        num_round = 400\n",
    "        early_stopping_rounds = 200\n",
    "        # no early stopping on shuffled labels: always train num_round rounds\n",
    "        model = lightgbm.train(params, train_matrix, num_round, valid_sets=[test_matrix],\n",
    "                        #feval=tpr_eval_score,\n",
    "                        callbacks=[lightgbm.log_evaluation(200)]\n",
    "                        )\n",
    "        #\n",
    "        list_neg.append(sorted(zip(list(x_train.columns), model.feature_importance(\"gain\")),key=lambda x: x[0],reverse=True))\n",
    "    #\n",
    "    print('finished .......')\n",
    "    important_no=[]\n",
    "    vv=list_neg[0]\n",
    "    for i in range(len(vv)):\n",
    "        key=vv[i][0]\n",
    "        value=vv[i][1]/len_round\n",
    "        for j in range(1,len_round):\n",
    "            value+=list_neg[j][i][1]/len_round\n",
    "        important_no.append((key,value))\n",
    "    #\n",
    "    diff_list=[]\n",
    "    diff_list_div=[]\n",
    "    for i in range(len(important_normal)):\n",
    "        diff_list.append((important_normal[i][0],important_normal[i][1]-important_no[i][1]))\n",
    "        diff_list_div.append((important_normal[i][0],important_normal[i][1]/(1e-8+important_no[i][1])))\n",
    "    importance_last=sorted(diff_list,key=lambda x: x[1],reverse=True)\n",
    "    # ratio-based ranking (actual / null): computed but not returned\n",
    "    importance_last_div=sorted(diff_list_div,key=lambda x: x[1],reverse=True)\n",
    "    return importance_last\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the data\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.datasets import load_breast_cancer  # load_boston (unused here) was removed in scikit-learn 1.2\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "X,y = load_breast_cancer(return_X_y = True)\n",
    "X = pd.DataFrame(X,columns=['col_'+ str(i) for i in range(X.shape[1]) ])\n",
    "y = pd.DataFrame(y,columns=['label'])\n",
    "\n",
    "# Build a training set and a test set; the test set has no label\n",
    "train = pd.concat([X,y],axis=1)\n",
    "test = X.iloc[:10,:]\n",
    "\n",
    "# train: features + label\n",
    "# test: features only\n",
    "\n",
    "x_train = X.copy()\n",
    "y_train = y.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training until validation scores don't improve for 500 rounds\n",
      "[200]\tvalid_0's auc: 0.997531\n",
      "[400]\tvalid_0's auc: 1\n",
      "Early stopping, best iteration is:\n",
      "[41]\tvalid_0's auc: 1\n",
      "finished .......\n",
      "[200]\tvalid_0's auc: 0.66205\n",
      "[400]\tvalid_0's auc: 0.655125\n",
      "[200]\tvalid_0's auc: 0.503968\n",
      "[400]\tvalid_0's auc: 0.515873\n",
      "[200]\tvalid_0's auc: 0.430199\n",
      "[400]\tvalid_0's auc: 0.462963\n",
      "[200]\tvalid_0's auc: 0.566234\n",
      "[400]\tvalid_0's auc: 0.558442\n",
      "[200]\tvalid_0's auc: 0.402703\n",
      "[400]\tvalid_0's auc: 0.418919\n",
      "[200]\tvalid_0's auc: 0.455882\n",
      "[400]\tvalid_0's auc: 0.486765\n",
      "[200]\tvalid_0's auc: 0.385294\n",
      "[400]\tvalid_0's auc: 0.376471\n",
      "[200]\tvalid_0's auc: 0.507407\n",
      "[400]\tvalid_0's auc: 0.545679\n",
      "[200]\tvalid_0's auc: 0.565079\n",
      "[400]\tvalid_0's auc: 0.550794\n",
      "[200]\tvalid_0's auc: 0.480149\n",
      "[400]\tvalid_0's auc: 0.468983\n",
      "finished .......\n",
       "Top 10 selected features ['col_22', 'col_23', 'col_7', 'col_27', 'col_20', 'col_21', 'col_6', 'col_3', 'col_2', 'col_0']\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtMAAAFlCAYAAAApj2TEAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAjnUlEQVR4nO3de7RkZXnn8e9PiRg0DCqNIYLpNoEEvCFBlAhiJDaMA3IxeBmWQOOaSNImxgSBDqPTZlZmhEkwJLASTQKLEFDMjCRmEIGI0cml1Ua6gQZRUQwgiGhIpkXApp/5o/YZK4dzqVNnV9U+p7+ftWp11d67qp46m+I86z3vfn+pKiRJkiQt3JMmXYAkSZK0VNlMS5IkSUOymZYkSZKGZDMtSZIkDclmWpIkSRqSzbQkSZI0pJ0mXcCwdt9991q5cuWky5AkSdIyd+ONNz5YVStm2rdkm+mVK1eycePGSZchSZKkZS7J12fb5zQPSZIkaUg205IkSdKQbKYlSZKkIdlMS5IkSUOymZYkSZKGZDMtSZIkDclmWpIkSRqSzbQkSZI0JJtpSZIkaUg205IkSdKQbKYlSZKkIdlMS5IkSUPaadIFLEUrz7560iV01l3v+w+TLkGSJGlsHJmWJEmShtRqM51kfZIz5th/YpItSbYnOahv+7OSfCrJ1iQXtlmTJEmSNCrjHpm+FTgB+My07Y8A7wZmbcQlSZKkrhmomU5ycpKbk2xOclmSlUluaLZ9MslzB3mdqrq9qu6YYft3q+rv6DXVc9Xxi0k2Jtn4rW99a5C3lCRJkkZm3mY6yfOB/wy8uqpeDLwD+APg0qp6EXA58PsjrbJRVR+sqoOq6qAVK1aM4y0lSZKkWQ0yMv1q4C+q6kGAqvoOcAhwRbP/MuDQ0ZQnSZIkdZereUiSJElDGmSd6RuAq5KcX1XfTvJM4B+AN9EblT4J+D8jrLFzXEtZkiRJMEAzXVVbkvw28OkkjwM3Ab8CXJLkXcC3gDWDvFmS4+nNt14BXJ1kU1Ud2ey7C9gVeEqS44DVVXXbwj+SJEmSNB6pqknXMJSDDjqoNm7cOLH3NwVxZo7aS5Kk5SbJjVV10Ez7nDMtSZIkDWkkzXSSi5Js6rvdn+TKOY4/IMmG5tiNSQ4eRV2SJElSmwa5AHHBqmpt/+Mk64GtczzlPOC9VXVNktc2j181itokSZKktixqZLqtZESg6F18CPDvgG/M8n4mIEqSJKkzhh6Z7ktG/NmqerBZMu9SesmIlyY5jV4y4nEDvNyvAdcm+R16Df7PznRQVX0Q+CD0LkActnZJkiSpDYsZmW4zGfGXgHdW1d7AO4E/XURdkiRJ0lh0ZTWPU4CPNvf/AvACREmSJHXeYi5AbDMZ8RvA4cDf0hvx/vIi6hoL11OWJEnS0M10m8mIwH8CLkiyE/AI8IvD1iVJkiSNiwmII7KjJiQ6Yi9JkpYbExAlSZKkEWi1mU6yPskZc+z/RJLvJakkdzSJh2uSPCXJJUluadasflWbdUmSJEmjMJIExDm8E9gOfAA4o6o2AiRZC1BVL0yyB3BNkpdW1fYx1ydJkiQNbKCR6baSDqvq9qq6Y4Zd+9NbHYSqegB4CHjCvBQTECVJktQl8zbTfUmHr66qFwPvAP6AXtLhi4DL6SUdLsZm4HVJdkqyCvgZYO/pB1XVB6vqoKo6aMWKFYt8S0mSJGlxBpnm8YSkwySHACc0+y8DzltkHRcD+wEbga/TW6/68UW+piRJkjRS454zPaOq2kZvPjUASf4B+NLkKpIkSZLmN0gz3WbS4YyS7EJvzevvJnkNsK2qblvMa06a6y1LkiQtf/M2020mHSY5nt586xXA1Uk2VdWRwB7AtUm2A/cCbxnq00iSJEljZALiiJiAKEmStDyYgChJkiSNQKsXICZZD2wFVgGvmLb7gmbfenordxzcF9ryQ8CfAAc2Nf1ZVf33NmuTJEmS2jaS1Tyqau1M
25PsR29JvQ9M23UisHOTgLgLcFuSD1XVXaOoT5IkSWpDVxIQC3hakp2AHwYeA/51hjpMQJQkSVJndCUB8X8C3wXuA/4J+J2q+s70g0xAlCRJUpcMMjL9hARE4BDgimb/ZcChi6zjYHqJhz9Gb771byR53iJfU5IkSRqprqzm8R+BT1TV96vqAeDvgRmXH5EkSZK6ohMJiPSmdrwauCzJ04CXA7+3yNecKNdbliRJWv7mHZmuqi3AVALiZuB8egmIa5LcTC+t8B2DvFmS45PcQ2+ayNVJrm12XQQ8PckW4PPAJVV184I/jSRJkjRGJiCOiAmIkiRJy4MJiJIkSdIIjDsBcX/gGHrrSN8JrKmqh0xAlCRJ0lI07gTE1cC6qtqW5FxgHXAWJiBKkiRpCRp3AuJ1VbWtebgB2GtqFyYgSpIkaYmZZALiacA1zX0TECVJkrTkTCQBMck5wDZ6jTiYgChJkqQlaOyreSQ5FTgaOKl+sC6fCYiSJElacsaagJjkKOBM4PCqerhvlwmIkiRJWnLmbaarakuSqQTEx4Gb6CUgXpLkXcC3gDUDvt+FwM7A9UkANlTV6fQSEC9pEhCDCYiSJElaAkxAHBETECVJkpYHExAlSZKkERhJM53koiSb+m73J7lygOf9RpJKsvso6pIkSZLaNJYExL6Y8Vkl2RtYTe9iREmSJKnzFjUy3VYyYuP99Fb6mHUStwmIkiRJ6pKhm+k2kxGTHAvcW1Wb5zrOBERJkiR1yWKmeTwhGTHJIcAJzf7LgPPme5EkuwC/SW+KhyRJkrRkdGE1j5+gFyG+OcldwF7AF5L86ESrkiRJkuaxmJHpVpIRq+oWYI+px01DfdDUiPdS5XrLkiRJy9/QzXTLyYiSJEnSkmMC4gjtiCmIjshLkqTlxgRESZIkaQRabaaTrE9yxhz7P5Hke03K4R1NOuKavv3PTbJ1rteQJEmSumIkCYhzeCewHfgAcEZVTZ+ncT5wzZhrkiRJkoYy0Mh0W0mHVXV7Vd0xy3scB3wN2DJHHSYgSpIkqTPmbabbTDqc4z2eDpwFvHeu40xAlCRJUpcMMjL9hKRD4BDgimb/ZcChi6xjPfD+qtq6yNeRJEmSxmbcc6Zn8zLgF5KcB+wGbE/ySFVdONmyJEmSpNkN0ky3knQ4l6o6bOp+kvXA1uXQSLvmsiRJ0vI2bzPdZtJhkuPpzbdeAVydZFNVHTl09ZIkSdIEmYA4Acs5GdHReEmStNyYgChJkiSNwEia6SQXNemGU7f7k1w5x/EvTvKPSW5J8tdJdh1FXZIkSVKbRrKaR1Wt7X88dVHhHE/5E3qJiJ9OchrwLuDdo6hNkiRJasuiRqbbSkYE9gU+09y/Hnj9LO9nAqIkSZI6Y+hmuuVkxC3Asc39E4G9ZzrIBERJkiR1yWJGpttMRjwN+OUkNwI/Ajy2iLokSZKksehEAmJVfRFYDZBkX8D11SRJktR5i2mmW0tGTLJHVT2Q5En0po780SLq6jzXYpYkSVoehm6m20xGBN6cZGoFkI8ClwxblyRJkjQuJiBOyHJNQXTUXZIkLTcmIEqSJEkj0GoznWR9kjPm2H9TkkeSfC/JQ03i4ZokT0lySfN4c5JXtVmXJEmSNArjXs3jLOCGqtqW5FyAqrpkar50Vb0wyR7ANUleWlXbx1yfJEmSNLCBRqbbSjqsquuqalvzcAOwV3N/f3qrg1BVDwAPAU+Yl2ICoiRJkrpk3ma65aTDfqcB1zT3NwOvS7JTklXAzzBDCqIJiJIkSeqSQaZ5PCHpMMkhwAnN/suA8xbypknOAbbRa8QBLgb2AzYCX6e3XvXjC3lNSZIkadzGnoCY5FTgaOCIatbla6Z+vLPvmH8AvjTu2iRJkqSFGKSZbjPp8CjgTODwqnq4b/su9Na8/m6S1wDbquq2BX6WJcX1mCVJkpa+eZvplpMOLwR2Bq5PArChqk4H9gCuTbIduBd4y4I/iSRJkjRmJiBOiAmIkiRJ
S8PYEhAHCG05McmWJNuTHDRt34uS/GOz/5YkT22zNkmSJKltI7kAMclFwCumbb6A3trSJwAfmHb8TsCfA2+pqs1JngV8fxS1SZIkSW0ZqJlOcjJwBlDAzcC76S1ntzvNnOmq+qep46tq7TyvN33TauDmqtrcPP/bA9YvSZIkTcwkQ1v67QtUkmuTfCHJmbPUYgKiJEmSOmOQOdNPCG0BDgGuaPZfBhy6yDp2al7jpObf45McMf0gExAlSZLUJa1egLgI9wCfqaoHm/WnPw4cOOGaJEmSpDkN0kzfAJzYXBTItNAWWEBoyxyuBV6YZJfmYsTDgWUd2iJJkqSlb6yhLUmOpzffegVwdZJNVXVkVf1zkvOBz9O7yPHjVbU8F2JuuB6zJEnS0mdoiyRJkjSHuUJbRrLOtIa31JMRHXGXJEk7klab6STrga3AKmYObdkKrAf2Aw6uqo3N81YCtwN3NMduqKrT26xNkiRJattIRqZnC21Jsh8zJCA27qyqA0ZRjyRJkjQKAy2Nl+TkJDcn2ZzksiQrk9zQbPtkkucO8jpVdXtV3TH/kZIkSVL3dSUBEWBVkpuSfDrJYbPUYgKiJEmSOqMrCYj3Ac+tqpcAvw5ckWTX6QeZgChJkqQu6UQCYlU9WlXfbu7fCNwJ7DvZqiRJkqS5dSIBMcmKJE9u7j8P2Af46mJeU5IkSRq1TiQgAq8EfivJ94HtwOnNdJIdjus0S5IkLR0mIEqSJElzMAFxCVqqSYiOrEuSpB1JVxIQTwLe1Xfsi4ADq2pTm/VJkiRJbepEAmJVXU5vvWqSvBD4SxtpSZIkdV0XExDfDHx4kNeTJEmSJqlLCYhT3gh8aJZaTECUJElSZ3QlARGAJC8DHq6qW2fabwKiJEmSuqQTCYh93sQso9KSJElS13QiAbF53ScBb8D50pIkSVoiupKACL0UxLuryhhxXK9ZkiRpKTABUZIkSZqDCYhLkAmIkiRJ3TeSCxCTXJRkU9/t/iRXznH8iUm2JNmeZMauX5IkSeqasSQg9sWMz+ZWZkhGlCRJkrpsUSPTY05GlCRJkjpl6GZ6AsmIJiBKkiSpUxYzMj22ZMQpJiBKkiSpS7qWgChJkiQtGYtppseSjChJkiR11dCreYwxGXGH5HrNkiRJ3WcCoiRJkjQHExCXgaWSiOiIuiRJ2pG0egFikvVJzphj/yeSfC9JJbmjSUdck+Q1SW5Mckvz76vbrEuSJEkahXGPTL8T2E4v6fCMqtoIkOQlwDFV9Y0kLwCuBZ4z5tokSZKkBRloZHrUSYdVdVNVfaN5uAX44SQ7D/4xJEmSpPGbt5meQNLh64EvVNWjM9RiAqIkSZI6Y5CR6bElHTaN+7nA22babwKiJEmSuqQzCYhJ9gKuAk6uqjsnXY8kSZI0n0Ga6ZEnHSbZDbgaOLuq/n4xryVJkiSNy7yreYwp6fDtwE8C70nynubw1VX1wII/0TLl+s2SJEndYwKiJEmSNAcTEJeJpZCC6Ai6JEnakYzkAsQkFzXphlO3+5NcOcfx65Pc23f8a0dRlyRJktSmkYxMV9Xa/sdJ1gNb53na+6vqd0ZRjyRJkjQKixqZbisZUZIkSVqKhm6mR5CM+PamCb84yTNmeU8TECVJktQZixmZbjMZ8Q+BnwAOAO4Dfnemg0xAlCRJUpd0IgGxqr5ZVY9X1Xbgj4GDJ12TJEmSNJ/FNNOtJSMm2bPv4fHArYuoS5IkSRqLoVfzaDMZETgvyQFAAXcBbxu2ruXMNZwlSZK6xQRESZIkaQ4mIC4jXU9BdPRckiTtSFq9ALFJMjxjjv03JXkkyfeSPJTkliRr+vY/N8nWuV5DkiRJ6opxj0yfBdxQVduSnAtQVZf07T8fuGbMNUmSJElDGWhkuq2kw6q6rqq2NQ83AHv1vcdxwNeALQv8DJIkSdJEzNtMjyDpcMppNKPQSZ5Ob9T6vfPUYgKiJEmSOmOQkek2kw4BSHIOsI1eIw6wHnh/VW2d
63kmIEqSJKlLxr6aR5JTgaOBI+oH6/K9DPiFJOcBuwHbkzxSVReOuz5JkiRpUIOMTLeZdHgUcCbwuqp6eGp7VR1WVSuraiXwe8B/s5GWJElS1807Mt1y0uGFwM7A9UkANlTV6UNVvoNyHWdJkqTuMAFRkiRJmoMJiMuICYiSJEnd0WoznWQ9sBVYBbxi2u4Lmn3rgf2Ag6tqY99z1wFvBR4HfrWqrm2zNkmSJKltIxmZrqq1M21Psh9wAvCBadv3p3dB4/OBHwP+Jsm+VfX4KOqTJEmS2jDuBMTbq+qOGXYdC3y4qh6tqq8BXwEOHvxjSJIkSeM3yQTEfs8B7u57fE+zbXotJiBKkiSpMyaSgDgsExAlSZLUJQNN8xiDe4G9+x7v1WyTJEmSOmusCYhz+BjwpiQ7J1kF7AN8bpGvKUmSJI3UWBMQkxxPb771CuDqJJuq6sjmPT4C3AZsA9a6ksfMXMdZkiSpO0xAlCRJkuZgAuIy1cU0REfOJUnSjmTcCYj7A8cAjwF3Amuq6qEkK4Hbgak1qDdU1elt1iZJkiS1bdwJiKuBdVW1Lcm5wDrgrGb3nVV1wCjqkSRJkkZh3AmI11XVtubhBnpL4EmSJElL0iQTEE8Drul7vCrJTUk+neSwWWoxAVGSJEmdMZEExCTn0FsC7/Jm033Ac6vqJcCvA1ck2XX680xAlCRJUpeMPQExyanA0cBJ1azLV1WPVtW3m/s30rs4cd9x1yZJkiQtxFgTEJMcBZwJvK6qHu7bviLJk5v7z6OXgPjVQT+EJEmSNAljTUAELgR2Bq5PAj9YAu+VwG8l+T6wHTi9mU6iObimsyRJ0mSZgChJkiTNwQTEHdCk0hEdLZckSTuScScgbgXWA/sBB1fVxuZ5BwMfnHoZYH1VXdVmbZIkSVLbxp2AuB9wAvCBabtuBQ5qkhH3BDYn+eu+gBdJkiSpc8adgHh7Vd0xw/aH+xrnpwJLcyK3JEmSdiiTTECc/j4vS7IFuIXeah5PGJU2AVGSJEldMpEExJlU1Wer6vnAS4F1SZ46wzEmIEqSJKkzxp6AOJ+qup3ehYovmHQtkiRJ0lzGmoA4mySrkuzU3P9x4KeBuxbzmpIkSdKojTUBMcnx9OZbrwCuTrKpqo6kN03k7L4ExF+emlai4bjesyRJ0uiZgChJkiTNwQTEHdi4kxAdEZckSTuScScg7g8cAzwG3AmsqaqHmue+iF6Yy670pnq8tKoeabM+SZIkqU3jTkBcDaxrkg7PBdYBZzUXH/458Jaq2txc7Pj9UdQmSZIktWXcCYjX9YWxbAD2au6vBm6uqs3Ncd+uqsdnqMPQFkmSJHXGJBMQTwOuae7vC1SSa5N8IcmZMz3B0BZJkiR1yUQSEJOcA2yj14hDb7rJofTWrD4UOD7JEQt5TUmSJGncxp6AmORU4GjgpPrBunz3AJ+pqger6mHg48CB465NkiRJWohBLkC8AbgqyflV9e1pCYiXsYAExCRHAWcChzdN85RrgTOT7EJvpY/DgfcP/jE0G5eqkyRJGp2xJiACFwI7A9cnAdhQVadX1T8nOR/4PFDAx6tqvAskS5IkSQtkAuIOZtQhLo6ES5Kk5WauBMSxz5mWJEmSlouRNNNJLkqyqe92f5Ir5zj+mUmuT/Ll5t9njKIuSZIkqU0jaaaram1VHTB1A/6I3nzo2ZwNfLKq9gE+2TyWJEmSOm1RzXRbyYjAscClzf1LgeNmeT8TECVJktQZQzfTLScjPruq7mvu3w88e6aDTECUJElSlyxmZLr1ZMTmdYre8niSJElSp3VlNY9vJtkToPn3gQnXI0mSJM1rkATE2bSWjAh8DDgFeF/z718toi7NwXWgJUmS2jN0M91yMuL7gI8keSvwdeANw9YlSZIkjYsJiDugUaYgOvItSZKWGxMQJUmSpBFYzJzpgSVZD2wFVgGvmLb7AmB/4BjgMeBOYE1VPTSO2iRJkqRhjaWZnlJVa2fanmQ1sK6q
tiU5F1gHnDXO2iRJkqSF6kQCYlVdV1XbmocbgL1meT8TECVJktQZXUlA7HcacM1MO0xAlCRJUpd0KgExyTnANnqNuCRJktRpY50zPZckpwJHA0fUUl2vT5IkSTuUTiQgJjkKOBM4vKoeXkRNGoBrQUuSJLWjKwmIFwI7A9cnAdhQVacPW5skSZI0DiYg7uDaTkN01FuSJC03JiBKkiRJI9CVBMTnAccC24EHgFOr6hvjqE2SJEkaVlcSEHetqnc3938VeA/gnGlJkiR1WlcSEP+17+HTgBkncpuAKEmSpC7pTAJikt9Ocje9JfXeM9MxJiBKkiSpSzqTgFhV51TV3vSa8Lcvoi5JkiRpLLq4msflwOsnXYQkSZI0n64kIO5TVV9uHh4LfHERdWkBXBdakiRpeF1JQHxfkp+itzTe13ElD0mSJC0BJiCq1RRER7olSdJyYwKiJEmSNAKdSECsqkuS/AqwFngcuLqqzhxHbZIkSdKwupKA+HP0Ljx8cVU9mmSPcdYlSZIkDaMTCYjALwHvq6pHAarqgVnezwRESZIkdUZXEhD3BQ5L8tkkn07y0pkOMgFRkiRJXdKVBMSdgGcCLwfeBXwkSRZRmyRJkjRyXVnN4x7go9XzOXrrTe8+4ZokSZKkOXUiARH4S+DngE8l2Rd4CvDgImrTArg2tCRJ0nC6koB4MXBxkluBx4BTaqmmyUiSJGmHYQKigPZSEB3lliRJy40JiJIkSdIItNpMJ1mf5Iw59t+U5JEk30vyUJJbkqxJ8qwkn0qyNcmFbdYkSZIkjcpYExCBs4AbqmpbknMBmijxpwHvBl7Q3CRJkqTOG2hkuq2kw6q6rqq2NQ83AHs1279bVX8HPDJPHSYgSpIkqTPmbaZbTjrsdxpwzUKeYAKiJEmSumSQkek2kw4BSHIOsI1eIy5JkiQtSeOeM02SU4GjgSNcS1qSJElL2SDNdGtJh0mOAs4EDq+qh4esWSPg+tCSJEkLN28z3XLS4YXAzsD1SQA2VNXpAEnuAnYFnpLkOGB1Vd22sI8jSZIkjY8JiJrTQpMRHeGWJEnLjQmIkiRJ0gi0egFikvXAVmAV8Ippuy8A9geOAR4D7gTWVNVDzXPXAW8FHgd+taqubbM2SZIkqW0jWc2jqtbOtD3JamBdXwLiOuCsJPvTu6Dx+cCPAX+TZN+qenwU9UmSJElt6EQCInAs8OGqerSqvgZ8BTh4hjpMQJQkSVJndCUB8TnA3X377mm2/RsmIEqSJKlLTECUJEmShtSVBMR7gb37Dtur2SZJkiR1VlcSED8GXJHkfHoXIO4DfG7wj6FRcd1oSZKk2XUiAbF5j48At9Gb/rHWlTwkSZLUdSYgqlULTUzsIkfjJUlSPxMQJUmSpBHoRAJikpOAd/Ud+yLgwKra1GZ9kiRJUps6kYBYVZfTLJOX5IXAX9pIS5Ikqeu6koDY783Ah2epwwRESZIkdUZXEhD7vRH40ExPMAFRkiRJXdKpBMQkLwMerqpbF/J6kiRJ0iR0JQFxypuYZVRakiRJ6pquJCCS5EnAG4DDFlC/OsY1miVJ0o6kEwmIzb5XAndX1VcX+BkkSZKkiTABUa1aDgmI4+AIviRJS4cJiJIkSdIIjOQCxCQX8W8TEH8U+HRVvXGW468Efqp5uBvwUFUdMIraJEmSpLaMJQGxL2Z8tuPf2Hfs7wL/Moq6JEmSpDYtappHW8mIfa8Xeit6zLg8ngmIkiRJ6pKhm+kRJSMeBnyzqr48004TECVJktQlixmZbj0ZEXgzhrZIkiRpiRh7AuJskuwEnAD8zKRrkSRJkgaxmGa6tWTExs8DX6yqexZRkybM9ZMlSdKOZOhmuuVkROg14U7xkCRJ0pJhAqJaZQKiNBn+VUiSRscEREmSJGkEWm2mk6xPcsYc+29K8kiS7yV5KMktSdYkOTjJpua2OcnxbdYlSZIkjcK4V/M4C7ihqrYlORegqi5JsgtwULN9T2Bzkr+uqm1j
rk+SJEka2EAj020lHVbVdX0N8gZgr2b7w33bnwrMOJHbBERJkiR1ybzN9IiSDgFOA67pe5+XJdkC3AKcPtOotAmIkiRJ6pJBRqZbTzpMcg6wjV4jTvO6n62q5wMvBdYleepCXlOSJEkat7Gv5pHkVOBo4KSaYV2+qrod2Aq8YMylSZIkSQsyyAWIrSUdJjkKOBM4vKoe7tu+Cri7uQDxx4GfBu5a0CdRJ7jWrSRJ2pHM20y3nHR4IbAzcH0SgA1VdTq9aSJnJ/k+sB345alpJZIkSVJXmYCoVpmAKEmSRmGSf/0eWwLiAKEt/yPJF5sl9a5Ksluz/TVJbmxCXG5M8uo265IkSZJGYSQXICa5qC/RcOq2BrgeeEGzpN6XgHXNUx4EjqmqFwKn0JuLLUmSJHXaQAmISU4GzqAXpnIz8G7gYmB3mjnTVfVPU8dX1doBXnYD8AvN8Tf1bd8C/HCSnavq0UHqkyRJkiahM6EtfV4PfGGmRtoEREmSJHVJZ0Jbmu3PB84F3jbT80xAlCRJUpcMNM2jTX2hLUf0h7Yk2Qu4Cji5qu4cd12SJEnSQg0yMn0DcGKSZwFMC22B4UJbXjcttGU34Grg7Kr6+4GrlyRJkiaoK6Etbwd+EnhPkvc0x66uqgcW9Gk0cSYgSpKkHYmhLZIkSdIcxhbaIkmSJO1IRnIBYpKLgFdM23xBVV0yiveTJEmSJmEkzfSAoS2SJEnSkuY0D0mSJGlINtOSJEnSkGymJUmSpCHZTEuSJElDspmWJEmShmQzLUmSJA3JZlqSJEkaks20JEmSNCSbaUmSJGlINtOSJEnSkFJVk65hKEm+BXy9ebg78OAEy9EPeC66w3PRHZ6L7vBcdIvnozs8F3P78apaMdOOJdtM90uysaoOmnQd8lx0ieeiOzwX3eG56BbPR3d4LobnNA9JkiRpSDbTkiRJ0pCWSzP9wUkXoP/Pc9Ednovu8Fx0h+eiWzwf3eG5GNKymDMtSZIkTcJyGZmWJEmSxm5JNdNJ1ie5N8mm5vbavn3rknwlyR1JjuzbflSz7StJzp5M5cufP+fxS3JXklua78LGZtszk1yf5MvNv89otifJ7zfn5+YkB062+qUvycVJHkhya9+2Bf/8k5zSHP/lJKdM4rMsdbOcC39fTECSvZN8KsltSbYkeUez3e/GmM1xLvxutK2qlswNWA+cMcP2/YHNwM7AKuBO4MnN7U7gecBTmmP2n/TnWG43f84T+7nfBew+bdt5wNnN/bOBc5v7rwWuAQK8HPjspOtf6jfglcCBwK3D/vyBZwJfbf59RnP/GZP+bEvtNsu58PfFZM7FnsCBzf0fAb7U/Mz9bnTnXPjdaPm2pEam53As8OGqerSqvgZ8BTi4uX2lqr5aVY8BH26OVbv8OXfHscClzf1LgeP6tv9Z9WwAdkuy5wTqWzaq6jPAd6ZtXujP/0jg+qr6TlX9M3A9cNTIi19mZjkXs/H3xQhV1X1V9YXm/v8Fbgeeg9+NsZvjXMzG78aQlmIz/fbmT0EXT/2ZiN5/HHf3HXNPs2227WqXP+fJKOC6JDcm+cVm27Or6r7m/v3As5v7nqPxWOjP3/MyWv6+mKAkK4GXAJ/F78ZETTsX4HejVZ1rppP8TZJbZ7gdC/wh8BPAAcB9wO9OslZpwg6tqgOBfw+sTfLK/p3V+7udy/VMiD//ifP3xQQleTrwv4Bfq6p/7d/nd2O8ZjgXfjdattOkC5iuqn5+kOOS/DHwv5uH9wJ79+3eq9nGHNvVnrl+/hqRqrq3+feBJFfR+1PcN5PsWVX3NX8qfaA53HM0Hgv9+d8LvGra9r8dQ53LXlV9c+q+vy/GK8kP0WveLq+qjzab/W5MwEznwu9G+zo3Mj2XaXM8jwemrtz+GPCmJDsnWQXsA3wO+DywT5JVSZ4CvKk5Vu3y5zxmSZ6W5Eem7gOr6X0fPgZMXfV+CvBXzf2PASc3V86/HPiXvj+5qj0L/flfC6xO8ozm
T62rm21aJH9fTEaSAH8K3F5V5/ft8rsxZrOdC78b7evcyPQ8zktyAL0/D90FvA2gqrYk+QhwG7ANWFtVjwMkeTu9L+CTgYurassE6l7WqmqbP+exezZwVe//lewEXFFVn0jyeeAjSd4KfB14Q3P8x+ldNf8V4GFgzfhLXl6SfIjeyNnuSe4B/gvwPhbw86+q7yT5r/R+WQH8VlUNeiGdGrOci1f5+2IiXgG8BbglyaZm22/id2MSZjsXb/a70S4TECVJkqQhLalpHpIkSVKX2ExLkiRJQ7KZliRJkoZkMy1JkiQNyWZakiRJGpLNtCRJkjQkm2lJkiRpSDbTkiRJ0pD+H4HZmDcaRY+GAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 864x432 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "train_df = train.copy()\n",
    "select_frts = list(set(train_df.columns)-set(['label']))\n",
    "importance_last = nullImportant(train_df,select_frts)\n",
    "\n",
    "importance_col = [item[0] for item in importance_last]\n",
    "importance_value = [item[1] for item in importance_last]\n",
    "\n",
    "\n",
    "plt.figure(figsize=(12,6))\n",
    "plt.barh(importance_col,importance_value)\n",
    "\n",
    "features_selected = importance_col[:10]\n",
    "print('Top 10 features:', features_selected)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Second implementation\n",
    "- Regression on the Boston housing prices dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Null importance feature selection\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import lightgbm as lgb\n",
    "import tqdm\n",
    "from IPython.display import display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Null importance feature selection\n",
    "# Idea: compare each feature's importance on the real labels with its importance\n",
    "# on shuffled labels. A feature that is only important by chance scores about as\n",
    "# high when the labels are shuffled, so it gets a low (negative) score.\n",
    "def get_feature_importances(X_train, y_train, shuffle=True, seed=None):\n",
    "    # Flatten y to 1-D (accepts an ndarray, a Series or a one-column DataFrame)\n",
    "    y = np.asarray(y_train).reshape(-1)\n",
    "    if shuffle:\n",
    "        # Shuffle the label order to break any real link between X and y\n",
    "        y = np.random.RandomState(seed).permutation(y)\n",
    "\n",
    "    # LightGBM logging is silenced by 'verbose': -1 in lgb_params\n",
    "    X_train_lgb = lgb.Dataset(X_train.values, y, free_raw_data=False)\n",
    "\n",
    "    lgb_params = {\n",
    "        'learning_rate': 0.01,\n",
    "        'boosting_type': 'gbdt',\n",
    "        'objective': 'regression_l1',\n",
    "        'metric': 'mae',\n",
    "        'feature_fraction': 0.6,\n",
    "        'bagging_fraction': 0.8,\n",
    "        'bagging_freq': 2,\n",
    "        'num_leaves': 31,\n",
    "        'verbose': -1,\n",
    "        'max_depth': 5,\n",
    "        'lambda_l2': 5,\n",
    "        'lambda_l1': 0,\n",
    "    }\n",
    "\n",
    "    lgb_regressor = lgb.train(params=lgb_params, train_set=X_train_lgb, num_boost_round=1000)\n",
    "    imp_df = pd.DataFrame()\n",
    "    imp_df['feature'] = list(X_train.columns)\n",
    "    imp_df['importance_gain'] = lgb_regressor.feature_importance(importance_type='gain')\n",
    "    imp_df['importance_split'] = lgb_regressor.feature_importance(importance_type='split')\n",
    "    return imp_df\n",
    "\n",
    "def nullImportances(x_train, y_train, nb_runs=80):\n",
    "    # Feature importances with the real (unshuffled) labels\n",
    "    actual_imp_df = get_feature_importances(x_train, y_train.values, shuffle=False)\n",
    "\n",
    "    # Null distribution: importances from nb_runs models trained on shuffled labels\n",
    "    null_imp_df = pd.DataFrame()\n",
    "    for i in tqdm.tqdm(range(nb_runs)):\n",
    "        imp_df = get_feature_importances(x_train, y_train.values, shuffle=True, seed=i)\n",
    "        imp_df['run'] = i + 1\n",
    "        # Concat the latest importances with the previous runs\n",
    "        null_imp_df = pd.concat([null_imp_df, imp_df], axis=0)\n",
    "\n",
    "    feature_scores = []\n",
    "    for _f in actual_imp_df['feature'].unique():\n",
    "        # Gain: actual importance vs. the 75th percentile of the null importances\n",
    "        f_null_imps_gain = null_imp_df.loc[null_imp_df['feature'] == _f, 'importance_gain'].values\n",
    "        f_act_imps_gain = actual_imp_df.loc[actual_imp_df['feature'] == _f, 'importance_gain'].mean()\n",
    "        # 1e-10 avoids log(0); the 1 + ... in the denominator avoids division by zero\n",
    "        gain_score = np.log(1e-10 + f_act_imps_gain / (1 + np.percentile(f_null_imps_gain, 75)))\n",
    "\n",
    "        # Split count: same comparison\n",
    "        f_null_imps_split = null_imp_df.loc[null_imp_df['feature'] == _f, 'importance_split'].values\n",
    "        f_act_imps_split = actual_imp_df.loc[actual_imp_df['feature'] == _f, 'importance_split'].mean()\n",
    "        split_score = np.log(1e-10 + f_act_imps_split / (1 + np.percentile(f_null_imps_split, 75)))\n",
    "\n",
    "        feature_scores.append((_f, split_score, gain_score))\n",
    "\n",
    "    # Relation between real and shuffled labels: a score > 0 means the real-label\n",
    "    # importance exceeds 1 + the 75th percentile of the 80 shuffled-label runs\n",
    "    scores_df = pd.DataFrame(feature_scores, columns=['feature', 'split_score', 'gain_score'])\n",
    "    display(scores_df)\n",
    "\n",
    "    # Apply a threshold to each score, then combine the two criteria\n",
    "    features_selected_1 = scores_df.loc[scores_df['split_score'] > 0.00, 'feature'].tolist()\n",
    "    features_selected_2 = scores_df.loc[scores_df['gain_score'] > 1.00, 'feature'].tolist()\n",
    "    features_selected_3 = list(set(features_selected_1) | set(features_selected_2))  # union\n",
    "    features_selected_4 = list(set(features_selected_1) & set(features_selected_2))  # intersection\n",
    "    print('split_score>0:             ', features_selected_1)\n",
    "    print('gain_score>1.0:            ', features_selected_2)\n",
    "    print('split_score OR gain_score: ', features_selected_3)\n",
    "    print('split_score AND gain_score:', features_selected_4)\n",
    "    # Return the features that pass both criteria (the conservative choice)\n",
    "    return features_selected_4"
   ]
  },
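  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal numpy-only sketch of the scoring rule inside `nullImportances`, so the formula can be checked on small numbers. The helper `null_importance_score` and the importance values are hypothetical. The formula is the one used above: `log(1e-10 + actual / (1 + p75))`, where `p75` is the 75th percentile of the importances from the shuffled-label runs.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def null_importance_score(actual_imp, null_imps, q=75):\n",
    "    # actual_imp: importance of one feature when trained on the real labels\n",
    "    # null_imps: importances of the same feature over shuffled-label runs\n",
    "    return np.log(1e-10 + actual_imp / (1 + np.percentile(null_imps, q)))\n",
    "\n",
    "# Hypothetical gain importances from 5 shuffled-label runs\n",
    "null_informative = np.array([10.0, 12.0, 9.0, 11.0, 8.0])\n",
    "null_noise = np.array([40.0, 55.0, 38.0, 61.0, 47.0])\n",
    "\n",
    "score_informative = null_importance_score(300.0, null_informative)  # p75 = 11 -> log(300 / 12)\n",
    "score_noise = null_importance_score(50.0, null_noise)  # p75 = 55 -> log(50 / 56)\n",
    "print(round(score_informative, 3), round(score_noise, 3))  # -> 3.219 -0.113\n",
    "```\n",
    "\n",
    "A positive score means the real-label importance beats `1 + p75` of its null distribution. A feature that is no more important with the real labels than with shuffled ones gets a negative score."
   ]
  },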
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the data\n",
    "# Note: load_boston was removed in scikit-learn 1.2; on newer versions swap in\n",
    "# another regression dataset (e.g. fetch_california_housing).\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.datasets import load_boston\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "X, y = load_boston(return_X_y=True)\n",
    "X = pd.DataFrame(X, columns=['col_' + str(i) for i in range(X.shape[1])])\n",
    "y = pd.DataFrame(y, columns=['label'])\n",
    "\n",
    "# Build a training set (features + target) and a test set (features only, no target)\n",
    "train = pd.concat([X, y], axis=1)\n",
    "test = X.iloc[:10, :]\n",
    "\n",
    "x_train = X.copy()\n",
    "y_train = y.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████████████████████████████████████████████████████████████████████████████| 80/80 [00:24<00:00,  3.30it/s]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>split_score</th>\n",
       "      <th>gain_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>col_0</td>\n",
       "      <td>-0.227374</td>\n",
       "      <td>0.120005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>col_1</td>\n",
       "      <td>-0.558304</td>\n",
       "      <td>-0.445115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>col_2</td>\n",
       "      <td>0.014699</td>\n",
       "      <td>0.388706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>col_3</td>\n",
       "      <td>0.398139</td>\n",
       "      <td>1.002499</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>col_4</td>\n",
       "      <td>0.205128</td>\n",
       "      <td>0.780741</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>col_5</td>\n",
       "      <td>-0.202143</td>\n",
       "      <td>0.917796</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>col_6</td>\n",
       "      <td>-0.252674</td>\n",
       "      <td>0.078728</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>col_7</td>\n",
       "      <td>-0.087960</td>\n",
       "      <td>0.309054</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>col_8</td>\n",
       "      <td>0.139524</td>\n",
       "      <td>0.400176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>col_9</td>\n",
       "      <td>0.056208</td>\n",
       "      <td>0.512934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>col_10</td>\n",
       "      <td>0.059551</td>\n",
       "      <td>1.201085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>col_11</td>\n",
       "      <td>-0.268819</td>\n",
       "      <td>-0.013553</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>col_12</td>\n",
       "      <td>-0.074885</td>\n",
       "      <td>1.255798</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   feature  split_score  gain_score\n",
       "0    col_0    -0.227374    0.120005\n",
       "1    col_1    -0.558304   -0.445115\n",
       "2    col_2     0.014699    0.388706\n",
       "3    col_3     0.398139    1.002499\n",
       "4    col_4     0.205128    0.780741\n",
       "5    col_5    -0.202143    0.917796\n",
       "6    col_6    -0.252674    0.078728\n",
       "7    col_7    -0.087960    0.309054\n",
       "8    col_8     0.139524    0.400176\n",
       "9    col_9     0.056208    0.512934\n",
       "10  col_10     0.059551    1.201085\n",
       "11  col_11    -0.268819   -0.013553\n",
       "12  col_12    -0.074885    1.255798"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "split_score>0:          ['col_2', 'col_3', 'col_4', 'col_8', 'col_9', 'col_10']\n",
      "gain_score>1.0:         ['col_3', 'col_10', 'col_12']\n",
      "split_score+gain_score: ['col_9', 'col_4', 'col_2', 'col_12', 'col_10', 'col_3', 'col_8']\n",
      "split_score-gain_score: ['col_3', 'col_10']\n"
     ]
    }
   ],
   "source": [
    "features_selected = nullImportances(x_train,y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### boruta_py\n",
    "- Feature engineering tools roundup (4): boruta_py  https://zhuanlan.zhihu.com/p/93310121"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
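  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "boruta_py compares each real feature against 'shadow' features (copies of every column, shuffled independently) and keeps the features that beat the best shadow more often than chance would allow. The numpy-only sketch below illustrates only that shadow-feature idea. It uses absolute Pearson correlation with the target as a stand-in importance, whereas boruta_py uses a tree model's importances. It also uses a simple majority of hits instead of boruta_py's statistical test. The function `shadow_feature_hits` and the synthetic data are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def shadow_feature_hits(X, y, n_iter=20, seed=0):\n",
    "    # Count how often each real feature beats the best shadow feature.\n",
    "    # Stand-in importance: |Pearson correlation with y| (boruta_py uses tree importances)\n",
    "    rng = np.random.default_rng(seed)\n",
    "    n_features = X.shape[1]\n",
    "    real_imp = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])\n",
    "    hits = np.zeros(n_features, dtype=int)\n",
    "    for _ in range(n_iter):\n",
    "        # Shadow features: each column shuffled independently, so any link to y is broken\n",
    "        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])\n",
    "        shadow_imp = np.abs([np.corrcoef(shadows[:, j], y)[0, 1] for j in range(n_features)])\n",
    "        hits += real_imp > shadow_imp.max()\n",
    "    return hits\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "X = rng.normal(size=(500, 4))\n",
    "y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)  # only columns 0 and 1 matter\n",
    "hits = shadow_feature_hits(X, y, n_iter=20)\n",
    "selected = np.flatnonzero(hits > 10)  # beat the best shadow in more than half the rounds\n",
    "print(hits, selected)\n",
    "```\n",
    "\n",
    "Columns 0 and 1 win every round. A pure-noise column can still pass this crude majority rule by chance. boruta_py guards against that by repeating the comparison over many iterations and applying a binomial test to the hit counts."
   ]
  },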
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
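    "### Iterating on the CV error directly (backward search)\n",
    "\n",
    "- This approach selects features with the scoring function itself. Because the same CV folds are used both to choose the features and to report the final score, it overfits very easily."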
   ]
  },
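  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The greedy pass implemented by `featureSelect` can be sketched without LightGBM. In this sketch, ordinary least squares stands in for the LightGBM model, and a hand-rolled K-fold split stands in for `KFold`. The names `cv_mse` and `backward_select` and the synthetic data are hypothetical. The tolerance mirrors the `0.0000002` threshold used in `featureSelect`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def cv_mse(X, y, n_folds=5, seed=0):\n",
    "    # K-fold CV mean squared error of least squares with an intercept.\n",
    "    # A fixed seed gives the same folds on every call.\n",
    "    idx = np.random.default_rng(seed).permutation(len(y))\n",
    "    oof = np.empty(len(y))\n",
    "    for val_idx in np.array_split(idx, n_folds):\n",
    "        trn_idx = np.setdiff1d(idx, val_idx)\n",
    "        A_trn = np.column_stack([np.ones(len(trn_idx)), X[trn_idx]])\n",
    "        coef, *_ = np.linalg.lstsq(A_trn, y[trn_idx], rcond=None)\n",
    "        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])\n",
    "        oof[val_idx] = A_val @ coef\n",
    "    return np.mean((oof - y) ** 2)\n",
    "\n",
    "def backward_select(X, y, tol=2e-7):\n",
    "    # One greedy pass in column order: drop a column and keep it dropped\n",
    "    # only if the CV error falls by more than tol\n",
    "    cols = list(range(X.shape[1]))\n",
    "    best = cv_mse(X[:, cols], y)\n",
    "    for j in range(X.shape[1]):\n",
    "        trial = [c for c in cols if c != j]\n",
    "        score = cv_mse(X[:, trial], y)\n",
    "        if best - score > tol:\n",
    "            cols, best = trial, score\n",
    "    return cols, best\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "X = rng.normal(size=(300, 6))\n",
    "y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=300)  # columns 2-5 are noise\n",
    "kept, score = backward_select(X, y)\n",
    "print(kept, round(score, 4))\n",
    "```\n",
    "\n",
    "The returned score comes from the same folds that chose `kept`, so it is optimistic. Check the selected features on a held-out split before trusting them."
   ]
  },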
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
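    "import numpy as np\n",
    "import lightgbm as lgb\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "def modeling_cross_validation(params, X, y, nr_folds=5):\n",
    "    # Out-of-fold predictions for every row\n",
    "    oof_preds = np.zeros(X.shape[0])\n",
    "    # random_state only takes effect with shuffle=True (recent scikit-learn raises otherwise)\n",
    "    folds = KFold(n_splits=nr_folds, shuffle=True, random_state=4096)\n",
    "\n",
    "    for fold_, (trn_idx, val_idx) in enumerate(folds.split(X, y)):\n",
    "        print('fold n°{}'.format(fold_ + 1))\n",
    "        trn_data = lgb.Dataset(X[trn_idx], y[trn_idx])\n",
    "        val_data = lgb.Dataset(X[val_idx], y[val_idx])\n",
    "\n",
    "        num_round = 20000\n",
    "        # Early stopping after 100 rounds without improvement, logging every 1000 rounds\n",
    "        clf = lgb.train(params, trn_data, num_round, valid_sets=[trn_data, val_data],\n",
    "                        callbacks=[lgb.early_stopping(100), lgb.log_evaluation(1000)])\n",
    "        oof_preds[val_idx] = clf.predict(X[val_idx], num_iteration=clf.best_iteration)\n",
    "\n",
    "    # Out-of-fold MSE against this function's y\n",
    "    score = mean_squared_error(y, oof_preds)\n",
    "\n",
    "    return score / 2"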
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def featureSelect(X, y, init_cols):\n",
    "    # Greedy backward elimination: try dropping each feature in turn and keep it\n",
    "    # dropped only if the CV score (MSE, lower is better) improves\n",
    "    params = {'num_leaves': 120,\n",
    "             'min_data_in_leaf': 30, \n",
    "             'objective':'regression',\n",
    "             'max_depth': -1,\n",
    "             'learning_rate': 0.05,\n",
    "             \"min_child_samples\": 30,\n",
    "             \"boosting\": \"gbdt\",\n",
    "             \"feature_fraction\": 0.9,\n",
    "             \"bagging_freq\": 1,\n",
    "             \"bagging_fraction\": 0.9 ,\n",
    "             \"bagging_seed\": 11,\n",
    "             \"metric\": 'mse',\n",
    "             \"lambda_l1\": 0.02,\n",
    "             \"verbosity\": -1}\n",
    "    best_cols = init_cols.copy()\n",
    "    best_score = modeling_cross_validation(params, X[best_cols].values, y, nr_folds=5)\n",
    "    print('Initial CV score: {:<8.8f}'.format(best_score))\n",
    "    for f in init_cols:\n",
    "        best_cols.remove(f)\n",
    "        score = modeling_cross_validation(params, X[best_cols].values, y, nr_folds=5)\n",
    "        diff = best_score - score\n",
    "        print('-' * 10)\n",
    "        if diff > 0.0000002:\n",
    "            print('Without {}: CV score {:<8.8f}, best CV score {:<8.8f}, improved, feature removed'.format(f, score, best_score))\n",
    "            best_score = score\n",
    "        else:\n",
    "            print('Without {}: CV score {:<8.8f}, best CV score {:<8.8f}, no improvement, feature kept'.format(f, score, best_score))\n",
    "            best_cols.append(f)\n",
    "    print('-' * 10)\n",
    "    print('CV score after selection: {:<8.8f}'.format(best_score))\n",
    "\n",
    "    return best_cols\n",
    "\n",
    "# Pass the feature matrix only: train also holds the 'label' column, which would leak the target\n",
    "best_features = featureSelect(x_train, y_train['label'].values, x_train.columns.tolist())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SHAP"
   ]
  }
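  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a LightGBM model, the shap package's `shap.TreeExplainer(model).shap_values(X)` returns one SHAP value per row and feature. Features are then typically ranked by their mean absolute SHAP value. The numpy-only sketch below shows that ranking step for the one case with a simple closed form: a linear model with independent features, where feature `i` contributes `coef_i * (x_i - E[x_i])`. The helper `linear_shap` and the synthetic data are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def linear_shap(X, coef):\n",
    "    # SHAP values for f(x) = b + coef @ x when features are independent:\n",
    "    # phi[n, i] = coef[i] * (X[n, i] - mean of column i)\n",
    "    return coef * (X - X.mean(axis=0))\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "X = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 0.1])  # very different feature scales\n",
    "y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 10.0 * X[:, 2]\n",
    "\n",
    "A = np.column_stack([np.ones(len(y)), X])\n",
    "params, *_ = np.linalg.lstsq(A, y, rcond=None)\n",
    "intercept, coef = params[0], params[1:]\n",
    "\n",
    "phi = linear_shap(X, coef)\n",
    "# Local accuracy: expected prediction + sum of SHAP values = each prediction\n",
    "assert np.allclose(intercept + coef @ X.mean(axis=0) + phi.sum(axis=1), A @ params)\n",
    "\n",
    "# Rank features by mean absolute SHAP value (largest first)\n",
    "mean_abs_shap = np.abs(phi).mean(axis=0)\n",
    "ranking = np.argsort(mean_abs_shap)[::-1]\n",
    "print(coef.round(3), mean_abs_shap.round(3), ranking)\n",
    "```\n",
    "\n",
    "Column 2 has the largest coefficient but the smallest spread, so it ranks last by mean absolute SHAP value. This is one reason to rank features by SHAP values rather than by raw coefficients when features have different scales."
   ]
  }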
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "221.654px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
